LGEO2185: Efficient Spatial Data I/O – Assignment

Author

Kristof Van Oost, Antoine Stevens & Valentin Charlier

Assignment

  • Query a Microsoft Planetary Computer STAC catalog for a satellite dataset of your choice.

  • Create a monthly time-series for a given region.

  • Apply an unsupervised kmean algorithm to the image to classify

  • Visualize the output and interpret

1. Download data & create monthly time series

Let’s zoom in an agriculture area in Belgium (Hesbaye)1

# Define parameters
aoi <- c(xmin = 4.612999, ymin = 50.638113, xmax = 4.732475, ymax = 50.576054)  # define area of interest in Hesbaye
t0 <- "2020-01-01" # start date
t1 <- "2024-12-31" # end date
toi <- paste(t0, t1, sep = "/") # time of interest

# Search collection
items <- stac("https://planetarycomputer.microsoft.com/api/stac/v1") %>%
    stac_search(
        collections = "sentinel-2-l2a",
        bbox = c(aoi["xmin"], aoi["ymin"], aoi["xmax"], aoi["ymax"]),
        datetime = toi
    ) %>%
    ext_filter(`eo:cloud_cover` < 10) %>% 
    post_request() %>%
    items_sign(sign_fn = sign_planetary_computer())

# number of matches
paste("Number of matches: ", items_length(items))
[1] "Number of matches:  100"
# Create collection
s2_collection <- stac_image_collection(items$features)

Get a 30x30m monthly average of NDVI of our study area

aoi_utm <- st_as_sfc(st_bbox(aoi, crs = 4326)) %>%
    st_transform("EPSG:32632") %>% # to UTM32N 
    st_bbox()

v <- cube_view(
    srs = "EPSG:32632",
    dx = 30, dy = 30,
    dt = "P1M", # Time aggregation, here 1 month (ISO 8601 duration format)
    aggregation = "median", # defining how to deal with pixels containing data from multiple images
    resampling = "average",
    extent = list(
        t0 = t0,
        t1 = t1,
        left = aoi_utm["xmin"],
        right = aoi_utm["xmax"],
        bottom = aoi_utm["ymin"],
        top = aoi_utm["ymax"]
    )
)

gdalcubes_options(parallel = 8)
r <- raster_cube(s2_collection, v) %>%
    select_bands(c("B04","B08")) %>%
    apply_pixel("(B08-B04)/(B08+B04)", "NDVI") %>% # calculate NDVI on the fly
    write_ncdf(file.path("data", "processed", "ndvi_hesbaye.nc"), overwrite = T)

2. Classify

Let’s explore unsupervised classification. In this approach, we rely solely on the spectral (NDVI) data, without providing any pre-assigned class labels or ground truth. This method can seem counterintuitive at first—after all, we’re not telling the algorithm which pixels belong to which land cover type—but it proves extremely valuable when detailed prior knowledge or ground data is scarce.

Various unsupervised classification algorithms exist, each with its own strengths and weaknesses. Here, we illustrate the general principle using the k-means algorithm. K-means works by grouping pixels into clusters based on the similarity of their NDVI time series. Essentially, the algorithm identifies patterns in the data and groups together pixels that share similar spectral characteristics.

This unsupervised approach is particularly useful when:

  • Limited ground data is available: It allows for exploratory analysis even when specific class labels are unknown.
  • General trends are of interest: Even if you have a broad idea of the distribution of land cover types, unsupervised classification can reveal natural groupings and patterns in the data.

By clustering the pixels, we can derive valuable insights into the spatial and temporal variations in vegetation, which can then serve as a basis for further analysis or more targeted supervised classification.

# Load the multi-temporal NDVI cube (each layer is one month)
ndvi_ts <- rast(r)
time(ndvi_ts) <- seq(as_date(t0),as_date(t1),by = "month") # somehow we lost the dates, so I set this here

# Remove layers having all na's
ndvi_ts <- ndvi_ts[[terra::global(ndvi_ts,"anynotNA")]]

# Set seed for reproducibility and define the number of clusters
set.seed(123)
k <- 10  # number of clusters

# Very cool, applying kmeans to a raster is just 1 command!
classification <- k_means(ndvi_ts, centers = k, iter.max = 100, nstart = 10)
classification
class       : SpatRaster 
size        : 247, 296, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 189414.2, 198294.2, 5611371, 5618781  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 32N (EPSG:32632) 
source(s)   : memory
name        : lyr1 
min value   :    1 
max value   :   10 

This returns a spatRaster! Let’s see what we’ve got…

4. Visualize

# Plot NDVI
tm_shape(ndvi_ts[[1]])+
  tm_raster(col.legend = tm_legend("NDVI"), 
            palette = "Greens") +
  tm_layout(legend.outside=TRUE)
# Plot distribution of clusters using tmap
tm_shape(classification) +
  tm_raster(title = "Cluster", palette = "Set3", style = "cat") +
  tm_layout(main.title = "Spatial Distribution of NDVI Time Series Clusters",
            legend.outside = TRUE)
# Create a SpatRaster with a layer for each class 
clusters <- segregate(classification,other = NA)

# Mask each NDVI by cluster class and get the mean NDVI
ndvi_mean <- purrr::map_dfr(1:k,function(x){
                                  df <- tibble(global(ndvi_ts*clusters[[x]],
                                                      "mean",na.rm=T))
                                  df$time <- time(ndvi_ts)
                                  df
                                  },
                            .id = "cluster")
# Plot until 2022
ggplot(filter(ndvi_mean,time <= as_date("2022-01-01")) ,
       aes(x = time,y = mean, col = cluster)) + 
  geom_line() +
  theme_minimal()

We can see rapidly that most likely clusters 2 & 8 are the same rotation, with peaks in the summer in 2020 and spring in 2021, while cluster 1, 7 & 10 display the reverse sequence. Clusters 4, 5 & 9 are built-up areas.

Session info

sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Brussels
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-3 gdalcubes_0.7.3    tmap_4.2           rstac_1.0.1       
 [5] arrow_23.0.1.1     sfarrow_0.4.1      geodata_0.6-6      terra_1.9-1       
 [9] lubridate_1.9.5    forcats_1.0.1      stringr_1.6.0      dplyr_1.2.0       
[13] purrr_1.2.1        tidyr_1.3.2        tibble_3.3.1       ggplot2_4.0.2     
[17] tidyverse_2.0.0    tictoc_1.2.1       sf_1.1-0           readr_2.2.0       

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1        farver_2.1.2            S7_0.2.1               
 [4] fastmap_1.2.0           leaflegend_1.2.1        leaflet_2.2.3          
 [7] pacman_0.5.1            XML_3.99-0.22           digest_0.6.39          
[10] timechange_0.4.0        lifecycle_1.0.5         magrittr_2.0.4         
[13] compiler_4.5.2          rlang_1.1.7             tools_4.5.2            
[16] yaml_2.3.12             data.table_1.18.2.1     knitr_1.51             
[19] labeling_0.4.3          htmlwidgets_1.6.4       curl_7.0.0             
[22] bit_4.6.0               sp_2.2-1                classInt_0.4-11        
[25] abind_1.4-8             KernSmooth_2.23-26      withr_3.0.2            
[28] leafsync_0.1.0          grid_4.5.2              cols4all_0.10          
[31] e1071_1.7-17            leafem_0.2.5            colorspace_2.1-2       
[34] spacesXYZ_1.6-0         scales_1.4.0            cli_3.6.5              
[37] rmarkdown_2.30          crayon_1.5.3            generics_0.1.4         
[40] otel_0.2.0              httr_1.4.8              tzdb_0.5.0             
[43] tmaptools_3.3           ncdf4_1.24              DBI_1.3.0              
[46] proxy_0.4-29            stars_0.7-1             parallel_4.5.2         
[49] assertthat_0.2.1        s2_1.1.9                base64enc_0.1-6        
[52] vctrs_0.7.1             jsonlite_2.0.0          hms_1.1.4              
[55] bit64_4.6.0-1           jpeg_0.1-11             crosstalk_1.2.2        
[58] jquerylib_0.1.4         maptiles_0.11.0         units_1.0-0            
[61] lwgeom_0.2-15           glue_1.8.0              leaflet.providers_2.0.0
[64] codetools_0.2-20        stringi_1.8.7           gtable_0.3.6           
[67] raster_3.6-32           logger_0.4.1            pillar_1.11.1          
[70] rappdirs_0.3.4          htmltools_0.5.9         R6_2.6.1               
[73] wk_0.9.5                evaluate_1.0.5          lattice_0.22-7         
[76] png_0.1-8               class_7.3-23            Rcpp_1.1.1             
[79] xfun_0.56               pkgconfig_2.0.3        

Footnotes

  1. You can choose one rapidly using bboxfinder↩︎